Amendments to the Claims:

This listing of claims will replace all prior versions and listings of claims in the application: 
Listing of Claims: 

1. (original) A programmable processor comprising: 
an instruction path; 

a data path; 

an external interface operable to receive data from an external source and communicate 
the received data over the data path; 

a cache operable to retain data communicated between the external interface and the data
path;

a register file operable to receive and store data from the data path and communicate the 
stored data to the data path; and 

an execution unit coupled to the instruction path and the data path and operable to decode 
and execute instructions received from the instruction path, wherein in response to decoding a 
single instruction specifying a data selection operand and a first and a second register each 
having a register width, the first and second registers providing a plurality of data elements each 
having an elemental width smaller than the register width of the first and second registers, the 
data selection operand comprising a plurality of fields each selecting one of the plurality of data 
elements, the execution unit is operable to provide the data element selected by each field of the 
data selection operand to a predetermined position in a catenated result.

2. (original) The processor of claim 1 wherein each field of the data selection operand 
provides a sufficient number of bits to specify any one of the plurality of data elements. 

3. (original) The processor of claim 2 wherein each field of the data selection operand 
has a width of n bits wherein the plurality of data elements comprises 2^n data elements.

4. (original) The processor of claim 1 wherein the data selection operand is provided by
a register specified by the single instruction. 

5. (original) The processor of claim 4 wherein the data selection operand has a width 
equal to the specified register width. 

6. (original) The processor of claim 1 wherein the catenated result is provided to a 
register. 

7. (original) The processor of claim 1 wherein the plurality of data elements has a 
combined width equal to the width of the first register plus the width of the second register. 

8. (original) The processor of claim 1 wherein the instruction further specifies a data 
element width of the plurality of data elements. 

9. (original) The processor of claim 1 wherein each data element has a width of 8 bits. 

10. (original) The processor of claim 1 wherein the catenated result has a width of 128
bits.

11. (original) The processor of claim 1 wherein for each field of the data selection
operand, a relative location of the field within the data selection operand corresponds to a 
relative location of the predetermined position within the catenated result. 
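
As a rough illustration only, the selection recited in claims 1 through 11 can be modeled in Python on lists of data elements. The function name `select_elements`, the ordering of the element pool (first register's elements before the second's), and the power-of-two check are assumptions made for this sketch and are not taken from the claims.

```python
def select_elements(first, second, selectors):
    """Hypothetical model: each selector field picks one data element.

    first, second: lists of data elements from the two source registers.
    selectors: list of field values, one per position in the result.
    """
    # Assumed ordering: elements of the first register come before the second.
    elements = list(first) + list(second)
    count = len(elements)
    if count == 0 or count & (count - 1):
        raise ValueError("sketch assumes a power-of-two number of elements")
    n = count.bit_length() - 1  # field width in bits, so that count == 2**n
    for s in selectors:
        if not 0 <= s < (1 << n):
            raise ValueError("selector does not fit in an n-bit field")
    # Field i of the operand fills position i of the catenated result.
    return [elements[s] for s in selectors]
```

For example, with four elements in total each field needs two bits, and `select_elements([10, 11], [12, 13], [3, 0, 0, 2])` yields `[13, 10, 10, 12]`: the same element may be selected more than once, and the position of each field determines the position of the element it selects in the result.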

12. (original) The processor of claim 1 wherein the execution unit is further operable to, 
in response to decoding a second single instruction specifying a third and a fourth register each 
containing a plurality of operands, multiply the plurality of floating point operands in the third 
register by the plurality of operands in the fourth register to produce a plurality of products and 
provide the plurality of products to partitioned fields of a result register as a second catenated 
result. 

13. (original) A programmable processor comprising: 
an instruction path;

a data path;

an external interface operable to receive data from an external source and communicate 
the received data over the data path; 

a cache operable to retain data communicated between the external interface and the data
path;

a register file operable to receive and store data from the data path and communicate the 
stored data to the data path; and 

an execution unit coupled to the instruction path and the data path and operable to decode 
and execute instructions received from the instruction path, wherein in response to decoding a 
single instruction specifying a data selection operand and a register having a register width, the 
register providing a plurality of data elements each having an elemental width smaller than the 
register width of the register, the data selection operand comprising a plurality of fields each 
selecting one of the plurality of data elements, the execution unit is operable to provide the data 
element selected by each field of the data selection operand to a predetermined position in a 
catenated result. 

14. (original) A data processing system comprising: 

(a) a bus coupling components in the data processing system; 

(b) an external memory coupled to the bus; 

(c) a programmable microprocessor coupled to the bus and capable of operation 
independent of another host processor, the microprocessor comprising: 

an instruction path; 
a data path; 

an external interface operable to receive data from an external source and 
communicate the received data over the data path; 

a cache operable to retain data communicated between the external interface and
the data path;

a register file operable to receive and store data from the data path and 
communicate the stored data to the data path; and

an execution unit coupled to the instruction path and the data path and operable to
decode and execute instructions received from the instruction path, wherein in response to 
decoding a single instruction specifying a data selection operand and a first and a second register 
each having a register width, the first and second registers providing a plurality of data elements 
each having an elemental width smaller than the register width of the first and second registers, 
the data selection operand comprising a plurality of fields each selecting one of the plurality of 
data elements, the execution unit is operable to provide the data element selected by each field of 
the data selection operand to a predetermined position in a catenated result. 

15. (original) The system of claim 14 wherein each field of the data selection operand 
provides a sufficient number of bits to specify any one of the plurality of data elements. 

16. (original) The system of claim 15 wherein each field of the data selection operand 
has a width of n bits wherein the plurality of data elements comprises 2^n data elements.

17. (original) The system of claim 14 wherein the data selection operand is provided by 
a register specified by the single instruction. 

18. (original) The system of claim 17 wherein the data selection operand has a width 
equal to the specified register width. 

19. (original) The system of claim 14 wherein the catenated result is provided to a 
register. 

20. (original) The system of claim 14 wherein the plurality of data elements has a 
combined width equal to the width of the first register plus the width of the second register. 

21. (original) The system of claim 14 wherein the instruction further specifies a data 
element width of the plurality of data elements. 

22. (original) The system of claim 14 wherein each data element has a width of 8 bits.

23. (original) The system of claim 14 wherein the catenated result has a width of 128
bits.

24. (original) The system of claim 14 wherein for each field of the data selection 
operand, a relative location of the field within the data selection operand corresponds to a 
relative location of the predetermined position within the catenated result. 

25. (original) The system of claim 14 wherein the execution unit is further operable to, 
in response to decoding a second single instruction specifying a third and a fourth register each 
containing a plurality of operands, multiply the plurality of floating point operands in the third 
register by the plurality of operands in the fourth register to produce a plurality of products and 
provide the plurality of products to partitioned fields of a result register as a second catenated 
result. 

26. (original) A data processing system comprising: 

(a) a bus coupling components in the data processing system; 

(b) an external memory coupled to the bus; 

(c) a programmable microprocessor coupled to the bus and capable of operation 
independent of another host processor, the microprocessor comprising: 

an instruction path; 
a data path; 

an external interface operable to receive data from an external source and 
communicate the received data over the data path; 

a cache operable to retain data communicated between the external interface and
the data path;

a register file operable to receive and store data from the data path and 
communicate the stored data to the data path; and 

an execution unit coupled to the instruction path and the data path and operable to 
decode and execute instructions received from the instruction path, wherein in response to 
decoding a single instruction specifying a data selection operand and a register having a register 
width, the register providing a plurality of data elements each having an elemental width smaller
than the register width of the register, the data selection operand comprising a plurality of fields 
each selecting one of the plurality of data elements, the execution unit is operable to provide the 
data element selected by each field of the data selection operand to a predetermined position in a 
catenated result. 

27. (new) A programmable processor comprising: 
an instruction path; 

a data path; 

a plurality of registers operable to receive and store data from the data path and 
communicate the stored data to the data path; and 

an execution unit coupled to the instruction path and the data path and operable to decode 
and execute instructions received from the instruction path, wherein in response to decoding a 
single instruction specifying a plurality of registers storing a plurality of 8-bit data elements, an 
index register storing an index vector comprising a plurality of equal-sized selectors stored in 
partitioned fields of the index register and a destination register, the execution unit is operable to, 
for each selector in the index vector, provide a data element selected by the selector to a 
predetermined position in the destination register. 

28. (new) The programmable processor set forth in claim 27 wherein the plurality of 
registers comprises two registers. 

29. (new) The programmable processor set forth in claim 27 wherein the plurality of 
registers comprises two 64-bit registers storing a combined total of sixteen 8-bit data elements. 

30. (new) The programmable processor set forth in claim 27 wherein the number of 
selectors stored in the index register is equal to the number of predetermined positions in the 
destination register. 

31. (new) The programmable processor set forth in claim 27 wherein the index register 
is a 64-bit register.

32. (new) The programmable processor set forth in claim 27 wherein the index vector
comprises n equal-sized selectors and the destination register comprises n equal-sized 
predetermined positions. 

33. (new) The programmable processor set forth in claim 32 wherein the selector stored 
in a lowest order set of bits of the index register provides a data element to a lowest order set of 
bits of the destination register, the selector in a second lowest order set of bits of the index
register provides a data element to a second lowest order set of bits of the destination register and
the selector stored in a highest order set of bits of the index register provides a data element to a 
highest order set of bits of the destination register. 

34. (new) The programmable processor set forth in claim 27 wherein the destination 
register is a 128-bit register. 

35. (new) The programmable processor set forth in claim 27 wherein each of the equal- 
sized selectors stored in partitioned fields of the index register is a 4-bit selector. 

36. (new) The programmable processor set forth in claim 27 wherein the index register 
stores sixteen 4-bit selectors. 
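
By way of a non-authoritative sketch, the combination of widths recited in claims 29, 31, 34 and 36 (two 64-bit source registers, a 64-bit index register holding sixteen 4-bit selectors, and a 128-bit destination) can be modeled on packed integers in Python. The function name `byte_select`, the treatment of the first source register as the low-order half of the element pool, and the numbering of elements from the least significant byte are assumptions of this sketch rather than limitations found in the claims.

```python
MASK64 = (1 << 64) - 1


def byte_select(src_lo, src_hi, index):
    """Hypothetical model: sixteen 4-bit selectors pick sixteen 8-bit elements.

    src_lo, src_hi, index: register contents as 64-bit unsigned integers.
    Returns the 128-bit destination value as an integer.
    """
    for value in (src_lo, src_hi, index):
        if not 0 <= value <= MASK64:
            raise ValueError("register values must fit in 64 bits")
    # Assumed layout: src_lo holds elements 0-7, src_hi holds elements 8-15.
    source = (src_hi << 64) | src_lo
    result = 0
    for pos in range(16):
        sel = (index >> (4 * pos)) & 0xF      # selector in field `pos`
        byte = (source >> (8 * sel)) & 0xFF   # 8-bit element it selects
        result |= byte << (8 * pos)           # same relative position in result
    return result
```

Under these assumptions, an index of `0xFEDCBA9876543210` copies the sixteen source bytes through unchanged, and an index of `0x0123456789ABCDEF` reverses their order.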

37. (new) A programmable processor comprising: 
an instruction path; 

a data path; 

an external interface operable to receive data from an external source and communicate 
the received data over the data path; 

a cache operable to retain data communicated between the external interface and the data
path;

a plurality of registers operable to receive and store data from the data path and 
communicate the stored data to the data path; and 

an execution unit coupled to the instruction path and the data path and operable to decode 
and execute instructions received from the instruction path, wherein in response to decoding a 
single instruction specifying a first register storing a first plurality of 8-bit data elements, a
second register storing a second plurality of 8-bit data elements, an index register storing an 
index vector comprising a plurality of equal-sized selectors stored in partitioned fields of the 
index register and a destination register, the execution unit is operable to, for each selector in the 
index vector, provide a data element from one of the first or second plurality of 8-bit data 
elements selected by the selector to a predetermined 8-bit position in the destination register, 
wherein the predetermined positions are contiguous blocks of bits that take up an entire width of 
the destination register. 

38. (new) The programmable processor set forth in claim 37 wherein the first and 
second registers are 64-bit registers, the index register is a 64-bit register and each selector stored 
in the index register has a sufficient number of bits to select any one of the 8-bit data elements in
the first or second pluralities of 8-bit data elements. 

39. (new) The programmable processor set forth in claim 37 wherein the destination 
register is a 128-bit register. 

40. (new) The programmable processor set forth in claim 37 wherein each of the equal- 
sized selectors stored in partitioned fields of the index register is a 4-bit selector. 

41. (new) A device having installed therein a programmable processor, the
programmable processor comprising: 

an instruction path; 
a data path; 

a plurality of registers operable to receive and store data from the data path and 
communicate the stored data to the data path; and 

an execution unit coupled to the instruction path and the data path and operable to decode 
and execute instructions received from the instruction path, wherein in response to decoding a 
single instruction specifying a plurality of registers storing a plurality of 8-bit data elements, an 
index register storing an index vector comprising a plurality of equal-sized selectors stored in 
partitioned fields of the index register and a destination register, the execution unit is operable to,
for each selector in the index vector, provide a data element selected by the selector to a 
predetermined position in the destination register. 

42. (new) The device set forth in claim 41 wherein the plurality of registers comprises 
two registers. 

43. (new) The device set forth in claim 41 wherein the plurality of registers comprises 
two 64-bit registers storing a combined total of sixteen 8-bit data elements. 

44. (new) The device set forth in claim 41 wherein the number of selectors stored in the 
index register is equal to the number of predetermined positions in the destination register. 

45. (new) The device set forth in claim 41 wherein the index register is a 64-bit register. 

46. (new) The device set forth in claim 41 wherein the index vector comprises n equal- 
sized selectors and the destination register comprises n equal-sized predetermined positions. 

47. (new) The device set forth in claim 46 wherein the selector stored in a lowest order 
set of bits of the index register provides a data element to a lowest order set of bits of the 
destination register, the selector in a second lowest order set of bits of the index register provides
a data element to a second lowest order set of bits of the destination register and the selector 
stored in a highest order set of bits of the index register provides a data element to a highest 
order set of bits of the destination register. 

48. (new) The device set forth in claim 41 wherein the destination register is a 128-bit 
register. 

49. (new) The device set forth in claim 41 wherein each of the equal-sized selectors 
stored in partitioned fields of the index register is a 4-bit selector. 

50. (new) The device set forth in claim 41 wherein the index register stores sixteen 4-bit 
selectors.

51. (new) A device having installed therein a programmable processor, the
programmable processor comprising: 

an instruction path; 
a data path; 

an external interface operable to receive data from an external source and communicate 
the received data over the data path; 

a cache operable to retain data communicated between the external interface and the data
path;

a plurality of registers operable to receive and store data from the data path and 
communicate the stored data to the data path; and 

an execution unit coupled to the instruction path and the data path and operable to decode 
and execute instructions received from the instruction path, wherein in response to decoding a 
single instruction specifying a first register storing a first plurality of 8-bit data elements, a 
second register storing a second plurality of 8-bit data elements, an index register storing an 
index vector comprising a plurality of equal-sized selectors stored in partitioned fields of the 
index register and a destination register, the execution unit is operable to, for each selector in the 
index vector, provide a data element from one of the first or second plurality of 8-bit data 
elements selected by the selector to a predetermined 8-bit position in the destination register, 
wherein the predetermined positions are contiguous blocks of bits that take up an entire width of 
the destination register. 

52. (new) The device set forth in claim 51 wherein the first and second registers are 64- 
bit registers, the index register is a 64-bit register and each selector stored in the index register 
has a sufficient number of bits to select any one of the 8-bit data elements in the first or second
pluralities of 8-bit data elements. 

53. (new) The device set forth in claim 51 wherein the destination register is a 128-bit 
register.

54. (new) The device set forth in claim 51 wherein each of the equal-sized selectors
stored in partitioned fields of the index register is a 4-bit selector. 